Here I analyze spatial transcriptomics data using R and Seurat. Spatial transcriptomics combines gene expression measurements with spatial information, allowing us to understand how genes are expressed in different regions of a tissue.
What is Spatial Transcriptomics?
** Why is it Important?**
Point of Spatial Tx
Biological Insights Only Possible with Spatial Data
Tissue organization:
Pathological insights:
# Install packages if not already installed
# install.packages(c("Seurat", "Matrix", "ggplot2", "viridis", "patchwork", "dplyr"))
# install.packages(c("sp", "spdep", "spatstat", "pheatmap", "sctransform"))
# SeuratData::InstallData("stxBrain.SeuratData")
# devtools::install_github('satijalab/seurat-data')
# Load essential libraries
library(Seurat) # v5.3.1 - main spatial analysis package
library(SeuratData) # Example Data to Analyse
library(ggplot2) # for visualization
library(dplyr) # for data manipulation
library(Matrix) # for sparse matrices
library(viridis) # for color palettes
library(patchwork) # for combining plots
library(spdep) # for spatial statistics
library(pheatmap) # for heatmaps
library(sctransform) # for SCTransform normalization
library(future) # for parallel processing
library(knitr) # for visualising tables
# Set up parallel processing for faster computation
plan("multisession", workers = 4)
# Increase the memory limit for parallel processing
options(future.globals.maxSize = 8000 * 1024^2) # Set to 4GB
# Set seed for reproducibility
set.seed(42)| Name | stxBrain |
| Platform | 10X Genomics Visium technology |
| Data Type | Spatial transcriptomics |
| Species | Mouse |
| Tissue | Brain: 2 serial anterior sections, 2x serial posterior sections |
# Tissue slice to use
tissue="anterior2"
# Default Assay to Study
assayofInterest="Spatial"
# Load the stxBrain dataset
seurat_obj <- LoadData("stxBrain", type = tissue)
# Ensure we're working with the Spatial assay
DefaultAssay(seurat_obj) <- assayofInterest
# Preview of first 5 genes, first 5 spots
GetAssayData(seurat_obj, slot = "counts")[1:5, 1:5]## 5 x 5 sparse Matrix of class "dgCMatrix"
## AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1
## Xkr4 . . .
## Gm1992 . . .
## Gm37381 . . .
## Rp1 . . .
## Sox17 . . 1
## AAACAGCTTTCAGAAG-1 AAACAGGGTCTATATT-1
## Xkr4 . .
## Gm1992 . .
## Gm37381 . .
## Rp1 . .
## Sox17 . .
## [1] "orig.ident" "nCount_Spatial" "nFeature_Spatial" "slice"
## [5] "region"
# Ensure spatial coordinates are in metadata
spatial_coords <- GetTissueCoordinates(seurat_obj)
seurat_obj@meta.data$spatial_x <- spatial_coords$x
seurat_obj@meta.data$spatial_y <- spatial_coords$y# Check if spatial coordinates are available
cat(paste("Image dimensions:",
paste(dim(seurat_obj@images[[tissue]]@image), collapse=" x " ) ))## Image dimensions: 599 x 600 x 3
Explanation image dimensions are height x width x channels (typically RGB)
Spatial Coordinates also give you an idea of how large the usable area of the image is.
Coordinate ranges tell you:
Red flags:
Spatial coordinates guide for downstream planning:
# Display spatial information
if("images" %in% slotNames(seurat_obj)) {
# Get tissue coordinates
print(paste("Spatial coordinate range - X:", round(min(spatial_coords$x), 2), "to", round(max(spatial_coords$x), 2)))
print(paste("Spatial coordinate range - Y:", round(min(spatial_coords$y), 2), "to", round(max(spatial_coords$y), 2)))
}## [1] "Spatial coordinate range - X: 2219 to 10004"
## [1] "Spatial coordinate range - Y: 1578 to 10113"
Verdict: is sufficiently large.
# Sum up counts in each column / spot
seurat_obj@meta.data$nCount_Spatial <- Matrix::colSums(GetAssayData(seurat_obj, slot = "counts"))
# Sum up number of genes with non zero count in each column/spot
seurat_obj@meta.data$nFeature_Spatial <- Matrix::colSums(GetAssayData(seurat_obj, slot = "counts") > 0)
print(seurat_obj)## An object of class Seurat
## 31053 features across 2825 samples within 1 assay
## Active assay: Spatial (31053 features, 0 variable features)
## 1 layer present: counts
## 1 spatial field of view present: anterior2
These statistics tell you about the overall size and filtering effects of your dataset.
Number of features (genes): How many genes passed
your min.cells filter (genes expressed in ≥3 spots)
Number of samples (spots): How many spots passed
your min.features filter (spots with ≥200 genes)
## [1] 31053
## [1] 2825
Verdict: stxBrain is a deeply sequenced high quality spatial data. Therefore, high Number of features is not abnormal.
This shows the overall sequencing effort and depth across your entire dataset.
Total UMI counts: Overall sequencing depth across all spots
# Total UMI counts (formatted)
format(sum(GetAssayData(seurat_obj, slot = "counts"))
, big.mark = ",")## [1] "55,880,100"
Verdict: Mouse brain tissue typically shows moderate-to-high UMI counts, therefore, this count is not abnormal.
This measures the sequencing depth per individual spot - a key quality metric.
What it means: Average number of UMI (unique molecular identifiers) detected per spot
data.frame(
`**Mean**`= round(mean(seurat_obj@meta.data$nCount_Spatial), 2)
, `**Median**`= round(median(seurat_obj@meta.data$nCount_Spatial), 2)
, `**Min**`=min(seurat_obj@meta.data$nCount_Spatial)
, `**Max**`=max(seurat_obj@meta.data$nCount_Spatial)
, `**Standard deviation**`= round(sd(seurat_obj@meta.data$nCount_Spatial), 2)
, check.names = F
) %>% t() %>% knitr::kable()| Mean | 19780.57 |
| Median | 19274.00 |
| Min | 199.00 |
| Max | 58134.00 |
| Standard deviation | 8495.21 |
Verdict: We see high average UMI counts per spot. As long as this is highly correlated with average features per spot we can interpret this as stxBrain being a dense deeply sequenced dataset.
A spot with low UMI might indicate - Poor tissue quality at that location - Technical failure during library prep - Spot located on tissue edge or artifact
How to interpret Spots with <500 UMI - <5%: Excellent quality, minimal filtering needed - 5-15%: Good quality, standard filtering recommended - 15-30%: Moderate quality, careful filtering required - >30%: Poor quality dataset, investigate technical issues
How to interpret Spots with <500 UMI - <10%: Excellent quality - 10-20%: Good quality, standard filtering - 20-40%: Moderate quality, may need adjusted thresholds - >40%: Poor quality, consider dataset viabilitylow_umi_spots <- sum(seurat_obj@meta.data$nCount_Spatial < 500)
low_feature_spots <- sum(seurat_obj@meta.data$nFeature_Spatial < 250)# Spots with <500 UMI:
paste(low_umi_spots, "(", round(100*low_umi_spots/ncol(seurat_obj), 1), "%)")## [1] "2 ( 0.1 %)"
# Spots with <250 features:
paste(low_feature_spots, "(", round(100*low_feature_spots/ncol(seurat_obj), 1), "%)")## [1] "2 ( 0.1 %)"
Verdict: 2 spots have low UMI and features. Consider for filtering.
This measures gene detection efficiency per spot.
What it means: Average number of genes detected per spot
data.frame(
`**Mean**`= round(mean(seurat_obj@meta.data$nFeature_Spatial), 2)
, `**Median**`= round(median(seurat_obj@meta.data$nFeature_Spatial), 2)
, `**Min**`=min(seurat_obj@meta.data$nFeature_Spatial)
, `**Max**`=max(seurat_obj@meta.data$nFeature_Spatial)
, `**Standard deviation**`= round(sd(seurat_obj@meta.data$nFeature_Spatial), 2)
, check.names = F
) %>% t() %>% knitr::kable()| Mean | 5015.13 |
| Median | 5173.00 |
| Min | 159.00 |
| Max | 8447.00 |
| Standard deviation | 1379.98 |
Verdict: We see high average genes per spot. As long as this is highly correlated with average UMI per spot we can interpret this as stxBrain being a dense deeply sequenced dataset.
This analyzes how many spots each gene is detected in.
What it means: Distribution of gene detection across spots
# Sum up number of spots where a gene has non-zero count
gene_detection_rates <- Matrix::rowSums(GetAssayData(seurat_obj, slot = "counts") > 0)
data.frame(
`**Genes detected in >50% of spots:**` = sum(gene_detection_rates > ncol(seurat_obj)/2)
, `**Genes detected in >10% of spots:**` = sum(gene_detection_rates > ncol(seurat_obj)/10)
, `**Genes detected in <5% of spots:**` = sum(gene_detection_rates < ncol(seurat_obj)/20)
, check.names=F
) %>% t() %>% knitr::kable()| Genes detected in >50% of spots: | 4099 |
| Genes detected in >10% of spots: | 10781 |
| Genes detected in <5% of spots: | 18883 |
Verdict: Housekeeping genes make up smallest group, gene detection seems normal.
True test of whether average UMI and gene per spot makes sense, is to see if the two values are properly correctly.
What it means: Correlation between UMI counts and number of genes detected
umi_feature_cor <- cor(seurat_obj@meta.data$nCount_Spatial, seurat_obj@meta.data$nFeature_Spatial)
print(paste("Correlation coefficient:", round(umi_feature_cor, 3)))## [1] "Correlation coefficient: 0.955"
Verdict: Strong positive correlation means we have a high quality deeply phenotyped dataset
This identifies spots with unusual UMI or feature counts using statistical methods.
What it means: Spots with values outside 1.5×IQR (interquartile range) from quartiles
# UMI outliers using 1.5 * IQR rule
Q1_umi <- quantile(seurat_obj@meta.data$nCount_Spatial, 0.25)
Q3_umi <- quantile(seurat_obj@meta.data$nCount_Spatial, 0.75)
IQR_umi <- Q3_umi - Q1_umi
umi_outliers <- sum(seurat_obj@meta.data$nCount_Spatial < (Q1_umi - 1.5*IQR_umi) |
seurat_obj@meta.data$nCount_Spatial > (Q3_umi + 1.5*IQR_umi))
# Feature outliers
Q1_feat <- quantile(seurat_obj@meta.data$nFeature_Spatial, 0.25)
Q3_feat <- quantile(seurat_obj@meta.data$nFeature_Spatial, 0.75)
IQR_feat <- Q3_feat - Q1_feat
feat_outliers <- sum(seurat_obj@meta.data$nFeature_Spatial < (Q1_feat - 1.5*IQR_feat) |
seurat_obj@meta.data$nFeature_Spatial > (Q3_feat + 1.5*IQR_feat))
# UMI outliers (1.5×IQR rule)
paste(umi_outliers, "spots (", round(100*umi_outliers/ncol(seurat_obj), 1), "%)")## [1] "14 spots ( 0.5 %)"
# Feature outliers (1.5×IQR rule)
print(paste(feat_outliers, "spots (", round(100*feat_outliers/ncol(seurat_obj), 1), "%)"))## [1] "13 spots ( 0.5 %)"
Verdict: Very few spots have abnormal UMI / feature counts (less than 0.5%) .
What it means: Percentage of zero values in the expression matrix
total_possible_counts <- nrow(seurat_obj) * ncol(seurat_obj)
actual_nonzero_counts <- sum(GetAssayData(seurat_obj, slot = "counts") > 0)
sparsity <- 1 - (actual_nonzero_counts / total_possible_counts)## [1] "83.8 %"
## [1] "14,167,739"
## [1] "87,724,725"
Verdict: Observed is within normal range (85-95%)
These plots help you visually assess data quality and identify potential issues that might not be obvious from summary statistics alone.
Red flags:
# Distribution of UMI counts
ggplot(
seurat_obj@meta.data # metadata
, aes(x = nCount_Spatial) # UMI counts
) +
geom_histogram(
bins = 30 # number of bins
, fill = "skyblue" # fill color
, alpha = 0.7 # transparency
) +
geom_vline(
xintercept = mean(seurat_obj@meta.data$nCount_Spatial) # mean line
, color = "red" # line color
, linetype = "dashed" # line type
) +
labs(
title = "Distribution of UMI Counts" # title
, x = "UMI Counts per Spot" # x-axis label
, y = "Number of Spots" # y-axis label
) +
theme_minimal()
Verdict: Bell-shaped with moderate spread. Peak is
higher than expected due to deep high quality sequencing
Red flags:
# Distribution of features
ggplot(
seurat_obj@meta.data # metadata
, aes(x = nFeature_Spatial) # feature counts
) +
geom_histogram(
bins = 30 # number of bins
, fill = "lightcoral" # fill color
, alpha = 0.7 # transparency
) +
geom_vline(
xintercept = mean(seurat_obj@meta.data$nFeature_Spatial) # mean line
, color = "red" # line color
, linetype = "dashed" # line type
) +
labs(
title = "Distribution of Features" # title
, x = "Features per Spot" # x-axis label
, y = "Number of Spots" # y-axis label
) +
theme_minimal()
Verdict: Bell shaped and mirroring UMI distribution. No
red flags.
Red flags:
# UMI vs Features correlation plot
ggplot(
seurat_obj@meta.data # metadata
, aes(x = nCount_Spatial, y = nFeature_Spatial) # UMI vs features
) +
geom_point(
alpha = 0.7 # transparency
, color = "darkblue" # point color
) +
geom_smooth(
method = "lm" # linear regression
, color = "red" # line color
, se = TRUE # show confidence interval
) +
labs(
title = paste("UMI vs Features Correlation (r =", round(umi_feature_cor, 3), ")")
, x = "UMI Counts" # x-axis label
, y = "Number of Features" # y-axis label
) +
theme_minimal()
Verdict: Clear positive relationship. NOTE: Might want
to check image to understand why there is a curve.
Red flags:
# Spatial distribution of UMI counts
SpatialFeaturePlot(seurat_obj, features = "nCount_Spatial") +
theme(legend.text = element_text(size = 5))Dying or stressed cells lose cytoplasmic mRNA but retain mitochondrial transcripts. Therefore we can use excess mitochondrial gene expression to detect tissue degradation, while keeping in mind any caveats for spatial data.
Unlike single-cell data, spatial omics adds tissue architecture context:
# Calculate mitochondrial gene percentage
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-|^mt-")
# Visualize QC metrics
p1 <- VlnPlot(seurat_obj
, features = c("nFeature_Spatial", "nCount_Spatial" , "percent.mt")
, ncol = 3
, pt.size = 0.1
, layer = "counts"
) &
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())# Visualize where these spots are located
# Identify high-mito spots
high_mito_spots <- seurat_obj$percent.mt >= 30
table(high_mito_spots)## high_mito_spots
## FALSE TRUE
## 2772 53
# Visualize where these spots are located
SpatialDimPlot(seurat_obj, cells.highlight = WhichCells(seurat_obj)[high_mito_spots],
cols = c("lightgray", "red"))
Verdict Mitochrondrial genes with count > 30 seem to
occur only on the edges. Combine this information with abnormally low
UMI/feature counts to filter out low quality spots.
We use SCTransform instead of LogNormalize as you would with single cell RNA-Seq. This normalises and scales the data and finds variable genes.
Important Note: For spatial transcriptomics data, Seurat recommends using SCTransform instead of LogNormalize because “standard approaches (such as the LogNormalize() function), which force each data point to have the same underlying ‘size’ after normalization, can be problematic” for spatial data where “the variance in molecular counts across spots is not just technical in nature, but also is dependent on the tissue anatomy”.
Why SCTransform for Spatial Data:
# Increase the memory limit to accommodate your dataset
plan("sequential")
# Modern preprocessing workflow for spatial transcriptomics using SCTransform
# This replaces the old workflow of NormalizeData + FindVariableFeatures + ScaleData
# SCTransform normalization (recommended for spatial data)
seurat_obj <- SCTransform(
seurat_obj # Seurat object
, assay = assayofInterest # input assay name
, method = "poisson" # use poisson model for spatial data
, verbose = FALSE # reduce output
)
# Set default assay to SCT for downstream analysis
DefaultAssay(seurat_obj) <- "SCT"
plan("multisession", workers = 4)PCA, UMAP, k-nearest neighbor, UMAP
# Note: We use the SCT assay for all downstream steps
# 4. Perform PCA
seurat_obj <- RunPCA(
seurat_obj # Seurat object
, assay = "SCT" # use SCT assay
, verbose = FALSE # reduce output
)
# 5. Find neighbors and clusters
seurat_obj <- FindNeighbors(
seurat_obj # Seurat object
, reduction = "pca" # use PCA for neighbor finding
, dims = 1:30 # use first 30 PCs
)
# Force sequential processing (no parallelization)
plan("sequential")
# Finding Gene Clusters
# Identify clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm. First calculate k-nearest neighbors and construct the SNN graph. Then optimize the modularity function to determine clusters.
seurat_obj <- FindClusters(
seurat_obj
#, resolution = 0.5
, verbose = FALSE
)
# 6. Run UMAP
seurat_obj <- RunUMAP(
seurat_obj # Seurat object
, reduction = "pca" # use PCA for UMAP
, dims = 1:30 # use first 10 PCs
, verbose = FALSE # reduce output
)## An object of class Seurat
## 48551 features across 2825 samples within 2 assays
## Active assay: SCT (17498 features, 3000 variable features)
## 3 layers present: counts, data, scale.data
## 1 other assay present: Spatial
## 2 dimensional reductions calculated: pca, umap
## 1 spatial field of view present: anterior2
We can visualise identified clusters in UMAP space or directly on the tissue slice.
p1 <- DimPlot(seurat_obj, reduction = "umap", label = TRUE)
p2 <- SpatialDimPlot(seurat_obj, label = TRUE, label.size = 3)
p1 | p2SpatialDimPlot(seurat_obj, cells.highlight = CellsByIdentities(object = seurat_obj, idents = c(9, 6, 2)), facet.highlight = TRUE, ncol = 3)
Verdict: Spatial Clusters seem to align strongly with
specific tissue regions (see region 9)